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Introduction 

This technical report is the union of two contributions to the discussion of the 
Read Paper Riemann manifold Langevin and Hamiltonian Monte Carlo methods 
(Calderhead and Girolami, 2010), presented in front of the Royal Statistical So- 
ciety on October 13 th 2010 and to appear in the Journal of the Royal Statistical 
Society Series B. 

The first comment establishes a parallel and possible interactions with Adap- 
tive Monte Carlo methods. The second comment exposes a detailed study of 
Riemannian Manifold Hamiltonian Monte Carlo (RMHMC) for a weakly identifi- 
able model presenting a strong ridge in its geometry. 

1 On Adaptive Monte Carlo - J. Cornebise and G. Peters 

The utility of RMMALA and RMHMC methodology is its ability to adapt Markov 
chain proposals to the current state. Many articles design Adaptive Monte Carlo 
(MC) algorithms to learn efficient MCMC proposals, such as the special case of 
controlled MCMC, Haario et al. (2001) which utilizes a historically estimated 
global covariance for a Random Walk Metropolis Hastings (RWMH) algorithm. 
Similarly, Atchade (2006) devised global adaptation in MALA. Surveys are pro- 
vided in Atchade et al. (2009), Andrieu and Thorns (2008) and Roberts and 
Rosenthal (2009). 
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When the proposal remains essentially unchanged regardless of the current 
Markov chain state, performance may be poor if the shape of the target distri- 
bution varies widely over the parameter space. A typical illustration involves the 
"banana-shaped" warped Gaussian (see comments by Cornebise and Bornn), iron- 
ically originally utilized to illustrate the strength of early Adaptive MC (Haario 
et al., 1999, Figure 1). This algorithm learned local geometry by estimating the 
covariance matrix based on a sliding window of past states. However, Haario 
et al. (2001, Appendix A) showed it could exhibit strong bias, perhaps connected 
to requirements of "diminishing adaptation" as studied in Andrieu and Moulines 
(2006). Recent locally adaptive algorithms satisfy this condition, e.g. State- 
dependent proposal scalings (Rosenthal, 2010, Section 3.4) fits a parametric family 
to the covariance as a function of the state, or the parameterized parameter space 
approach of Regional Adaptive Metropolis Algorithms (Roberts and Rosenthal, 
2009, Section 5). 

Riemannian approaches provide strong rationale for parameterizing the pro- 
posal covariance as a function of the state - without learning, when the FIM 
(or observed FIM) can be computed or estimated, (see comment by Doucet and 
Jacob). With unknown FIM, or to learn the optimal step size, it would be in- 
teresting to combine Riemannian Monte Carlo with adaption. A first step could 
involve a simplistic Riemann-inspired algorithm such as a centered RWMH via 
the (observed) FIM as the proposal covariance (Section 4.3.1 Marin and Robert, 
2007, as used in) - equivalent to one step of RMMALA without drift. 

An additional use of Riemannian MC could be within the MCMC step of 
Particle MCMC (Andrieu et al., 2010), where adaption was highly advantageous 
in the AdPMCMC algorithms of Peters et al. (2010). 

Another interesting extension involves considering the stochastic approxima- 
tion alternative approach, based on a curvature updating criterion of (Okabayashi 
and Geyer, 2010, Equation 10), for an adaptive line search. This was proposed as 
an alternative to MCMC-MLE of Geyer (1991) for complex dependence structures 
in exponential family models. In particular comparing properties of this curvature 
based condition based on local gradient information with adaptive RMMALA and 
RMHMC versions of the MCMC-MLE algorithms would be instructive. 

Additionally, one may consider how to extend Riemannian MC to trans- 
dimensional MC such as reversible jump (Richardson and Green, 1997), for which 
adaptive extensions are rare (Green and Hastie, 2009, Section 4.2). We wonder 
how a geometric approach may be extended to efficiently explore disjoint unions 
of model subspaces as in Nevat et al. (2009). 

Finally, an open question to the community: could such geometric tools be 
utilized in Approximate Bayesian Computation (Beaumont et al., 2009), e.g. to 
design the distance metric between summary statistics? 
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2 RMHMC for unidentifiable models - j. Comebise and 

L. Bornn 



In this comment we show how the proposed RMHMC method can be particularly 
useful in the case of strong geometric features such as ridges commonly occur- 
ring in nonidentifiable models. While it has been suggested to use tempering or 
adaptive methods to handle these ridges (Neal, 2001; Haario et al., 2001, e.g.), 
they remain a celebrated challenge for new Monte Carlo methods (Cornuet et al., 
2009). We suspect that RMHMC, by exploiting the geometry of the surface to help 
make intelligent moves along the ridge, is a brilliant advance for nonidentifiability 
sampling issues. 

Consider observations yi, . . . , y n ~ H(9\ + 0f , a^). The parameters 9\ and 9 2 
are non-identifiable without any additional information beyond the observations: 
any values such that 9i + 9\ = c for some constant c explain the data equally well. 
By imposing a prior distribution, 9\, 9% ~ A/"(0, cr|) , we create weak identifiability, 
namely decreased posterior probability for c far from zero. Figure 1 shows the 
prior, likelihood, and ridge-like posterior for the model. For this problem, we have 
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Figure 2 compares typical trajectories of both HMC and RMHMC, demonstrating 
the ability of RMHMC to follow the full length of the ridge. 

HMC and RMHMC also differ in sensitivity to the step size. As described by 
Neal (2010), HMC suffers from the presence of a critical step size above which 
the error explodes, accumulating at each leapfrog step. In contrast, RMHMC 
occasionally exhibits a sudden jump in the Hamiltonian at one specific leapfrog 
step, followed by well-behaving steps (as seen in Figure 2. (a)). This is due to the 
possible divergence of the fixed point iterations (FPI) in the generalized leapfrog 
equations 

p (r + |) = p( T ) - £ -V e H {0 (r) , p (r + |) } (16) 

for given momentum p(r), parameter 0(r) and step size e. Figure 3 shows the 
probability of (16) having a solution p(e/2) as a function of 0(0), and of the 
derivative at the fixed point being "small enough" for the FPI to converge; the 
well-known sufficient theoretical threshold on the derivative (Fletcher, 1987, see 
e.g.) is 1, but we conservatively chose 1.2 based on typical successful runs. When 
the finite number of FPI diverges, the Hamiltonian explodes; however, subsequent 
steps may still admit a fixed point, and hence behave normally. Unsurprisingly, 
this behavior is much more likely to occur for larger step sizes. 
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Figure 1: Prior, likelihood, and posterior for the warped bivariate Gaussian with 
n = 100 values generated from the likelihood with parameter settings ag = a y = 1. 
As the sample size increases and the prior becomes more diffuse, the posterior 
becomes less identifiable and the ridge in the posterior becomes stronger. 
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Leapfrog Trajectories 



Leapfrog Trajectories 





Evolution of the Hamiltonian 



Evolution of the Hamiltonian 
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(a) RMHMC 



(b) HMC 



Figure 2: Three typical consecutive trajectories of 20 leapfrog steps each, with 
step size of 0.1, for the RMHMC and HMC algorithm, chosen to highlight two 
acceptances (black) and one rejection (red), representative of the approximately 
65% acceptance ratio for both HMC and RMHMC. We see that RMHMC is able 
to track the contours of the density and reach to the furthest tails of the ridge, 
adapting to the local geometry, whereas the spherical moves of HMC oscillate 
back and forth across the ridge. 
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E = 0.10, o 8 =0.50 E . 1 .00, a e . 0.50 




(a) Probability of existence of fixed point 



E = 0.10, <J B =0.50 E . 1 .00, o B . 0.50 
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(b) Probability of existence of fixed point and of a derivative lower 
than 1.2 

Figure 3: Probability of a single iteration of the generalized leapfrog (a) admit- 
ting a fixed point p (|), (b) admitting a fixed point and having a small enough 
derivative at the fixed point for the FPI to converge. Both plotted as a function 
of starting point #(0). Shown for two step sizes e G {0.1, 1.0}, and two prior dis- 
tribution standard deviations oq G {0.5, 1.0}, i.e. varying levels of identifiability. 
The region of stability for FPI becomes much smaller as the step size increases 
and as identifiability decreases, even creating regions with null probability of con- 
vergence. 
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While the regions of low probability can strongly decrease the mixing of the 
algorithm, they do not affect the theoretical convergence ensured by the rejection 
step. Far from being a downside, understanding this behavior can bring much 
practical insight when choosing the step size - possibly adapting it on-the-fly, 
when RMHMC already provides a clever way to adaptively devise the direction 
of the moves. 
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